Online detecting end times of spoken utterances for synchronization of live speech and its transcripts

Authors

  • Jie Gao
  • Qingwei Zhao
  • Yonghong Yan
Abstract

In this paper, we present our initial efforts on the task of Automatically Synchronizing live spoken Utterances with their Transcripts (ASUT), where the transcripts are the textual contents of the utterances. We address the online detection of the end time of a spoken utterance given its textual content, which is one of the key problems of the ASUT task. A frame-synchronous likelihood ratio test (FS-LRT) procedure is proposed and explored under the hidden Markov model (HMM) framework. The properties of FS-LRT are studied empirically. Experiments indicate that the proposed approach performs satisfactorily. In addition, the proposed procedure has been successfully applied in a subtitling system for live broadcast news.
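
The abstract does not spell out the FS-LRT procedure, so the following is only a generic sketch of what a frame-synchronous likelihood-ratio test for end-of-utterance detection might look like, not the authors' implementation. It assumes that an HMM decoder already supplies two log-likelihood scores per frame, one for the hypothesis that the utterance has ended and one for the hypothesis that it is still ongoing. The names `detect_end_frame`, `loglik_ended`, `loglik_ongoing`, `threshold`, and `min_consecutive` are all hypothetical.

```python
from typing import Optional, Sequence


def detect_end_frame(
    loglik_ended: Sequence[float],
    loglik_ongoing: Sequence[float],
    threshold: float = 0.0,
    min_consecutive: int = 3,
) -> Optional[int]:
    """Generic frame-by-frame likelihood-ratio test (illustrative assumption).

    loglik_ended[t]   : log-likelihood at frame t that the utterance has ended
    loglik_ongoing[t] : log-likelihood at frame t that it is still ongoing
    Returns the first frame of the first run of `min_consecutive` frames whose
    log-likelihood ratio exceeds `threshold`, or None if no such run occurs.
    """
    if len(loglik_ended) != len(loglik_ongoing):
        raise ValueError("score sequences must have the same length")
    if min_consecutive < 1:
        raise ValueError("min_consecutive must be at least 1")

    run = 0  # number of consecutive frames that passed the test so far
    for t, (ended, ongoing) in enumerate(zip(loglik_ended, loglik_ongoing)):
        llr = ended - ongoing  # log-likelihood ratio at frame t
        run = run + 1 if llr > threshold else 0
        if run >= min_consecutive:
            # The decision is made at frame t; the run started earlier.
            return t - min_consecutive + 1
    return None


if __name__ == "__main__":
    # Toy scores, made up for illustration only.
    ended = [-5.0, -4.0, -1.0, 2.0, 3.0, 4.0]
    ongoing = [0.0] * 6
    print(detect_end_frame(ended, ongoing))
```

Because the test is evaluated once per incoming frame and uses only past scores, a loop of this shape can run online; requiring several consecutive frames above the threshold is one simple way to trade detection delay against false alarms.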

Similar resources

A Study on Morpho-Syntactic Patterns: A Cohesive Device in Some Persian Live Sport Radio and TV Talks

Morpho-syntactic patterns are a subcategory of cohesive devices that help hearers form an adequate mental representation for understanding speech. This article investigates the morpho-syntactic patterns employed in some Persian live sport radio and TV programs, adapting Dooley and Levinsohn’s theoretical and analytical framework. The research data includes around 30,000 ...

Summarizing multiple spoken documents: finding evidence from untranscribed audio

This paper presents a model for summarizing multiple untranscribed spoken documents. Without assuming the availability of transcripts, the model modifies a recently proposed unsupervised algorithm to detect re-occurring acoustic patterns in speech and uses them to estimate similarities between utterances, which are in turn used to identify salient utterances and remove redundancies. This model ...

Detecting and extracting named entities from spontaneous speech in a mixed-initiative spoken dialogue context: How May I Help You?℠,™

The understanding module of a spoken dialogue system must extract, from the speech recognizer output, the kind of request expressed by the caller (the call type) and its parameters (numerical expressions, time expressions or proper names). Such expressions are called Named Entities and their definitions can be either generic or linked to the dialogue application domain. Detecting and extracting ...

Construction of Back-Channel Utterance Corpus for Responsive Spoken Dialogue System Development

In spoken dialogues, if a spoken dialogue system does not respond at all during the user’s utterances, the user might feel uneasy because the user does not know whether or not the system has recognized the utterances. In particular, back-channel utterances, which the system outputs as voices such as “yeah” and “uh huh” in English, have important roles for a driver in in-car speech dialogues because the ...

An Algorithm for Extracting Similar Partial Utterances toward Flexible Spoken Document Retrieval

This paper proposes a new approach for spoken document retrieval by extracting similar partial utterances from non-segmented and non-recognized data such as presentation speech, lecture speech, or recorded video. For this purpose, we propose a new, efficient algorithm that performs fast matching between arbitrary sections of the database and arbitrary sections of query input. It enables searc...

Publication date: 2009